Word Sense Disambiguation By Human Subjects: Computational And Psycholinguistic Applications

Authors

  • Thomas E. Ahlswede
  • David Lorand
Abstract

Although automated word sense disambiguation has become a popular activity within computational lexicology, evaluation of the accuracy of disambiguation systems is still mostly limited to manual checking by the developer. This paper describes our work in collecting data on the disambiguation behavior of human subjects, with the intention of providing (1) a norm against which dictionary-based systems (and perhaps others) can be evaluated, and (2) a source of psycholinguistic information about previously unobserved aspects of human disambiguation, for the use of both psycholinguists and computational researchers. We also describe two of our most important tools: a questionnaire of ambiguous test words in various contexts, and a hypertext user interface for efficient and powerful collection of data from human subjects.

1 The need for a metric of disambiguation

Research in automatic lexical disambiguation has been going on for decades, and in recent years experimental disambiguation systems have proliferated. The problem of determining the accuracy of these systems has been little recognized: the usual check for correctness is a comparison of the test results against the experimenter's own judgment. Even less considered has been the question of what constitutes correctness in disambiguation, beyond the intuitive recognition that some disambiguations are better ("correct") and others worse ("incorrect").

A common approach to disambiguation is to select among the homographs and senses provided by a machine-readable dictionary (e.g., Lesk [1986], Byrd [1989], Krovetz [1989], Slator [1989], Guthrie et al. [1990], Ide and Veronis [1990], and Veronis and Ide [1990]). Dictionaries deal with the ambiguity of words by providing multiple definitions for sufficiently ambiguous words. These multiple definitions may be homographs (distinct words of unrelated meaning, whose written forms coincide) or senses (related but nonidentical meanings of a single word).
The inadequacy of a finite, discrete set of sense definitions to resolve all ambiguities has been pointed out by Boguraev and Pustejovsky [1990], Kilgarriff [1991], and Ahlswede [forthcoming]. For the practical task of disambiguation in natural language processing, however, the dictionary is a valuable and convenient source of sense distinctions; in our view, the best single source.

2 Evaluations of Human and Automatic Disambiguation

Many previous studies of human disambiguation have been from a psycholinguistic point of view. Simpson and Burgess [1988], surveying some of these studies, identify three basic models of ambiguity processing: (1) restriction by context, (2) ordered access, and (3) multiple access. Prather and Swinney [1988] consider whether the lexical component of human language processing is modular, i.e., acts independently of other components, or whether it interacts with other components.

Computationally oriented evaluations of human disambiguation began as incidental adjuncts to computational projects. Amsler and White [1979], with the help of assistants, manually (i.e., by human judgment) disambiguated the nouns and verbs used in definitions in the Merriam-Webster Pocket Dictionary. In an informal study, they found that their disambiguators' self-consistency on repeat performance was high (84%) but their consistency with respect to each other was lower.

The need for some means of evaluating automatic disambiguation methods, more rigorous than the experimenter's personal judgment, has become more obvious with the recent growing interest in the topic. Gale, Church and Yarowsky [1992], for instance, have followed the approach of estimating upper and lower bounds on the performance of a system.

3 Preliminary experiments

The project described in this paper began when one of us (Ahlswede) wrote disambiguation programs based on those of Lesk [1986] and Ide and Veronis [1990] for application in dictionary and corpus research.
Lesk claimed 50-70% accuracy on short samples of literary and journalistic input. Ide and Veronis claimed a 90% accuracy rate for their program, although they explained that they had tested it against strongly distinct definitions, mainly homographs rather than senses. After running the programs on test data containing ambiguities at both homograph and sense level, and evaluating the results, Ahlswede doubted whether, given this subtler mix of ambiguities, even a single human judge would achieve 90% consistency on successive evaluations of the same output; moreover, the consistency among multiple judges might well be much lower.

Ahlswede recruited seven colleagues and friends to evaluate the test data, then compared their disambiguations of the test data against each other. The level of agreement averaged only 66% among the various human informants, ranging from 31% to 88% between pairs of informants [Ahlswede, forthcoming].

This figure was based on a simple pairwise comparison strategy. The informants rated each sense definition of a test word with a "1" if it correctly represented the meaning of the word as used in the test text; "-1" if the definition did not correctly represent the meaning; and "0" if for any reason the informant could not decide one way or the other. Pairs of informants were then compared by matching their ratings of the sense definitions of each word. A pair was considered to agree on a test word if at least one sense received a "1" from both informants and if no sense receiving a "1" from either informant was given a "-1" by the other.

This scoring method had the advantage of simplicity, but it did not reflect the agreement implicit in the rejection, as well as the selection, of senses by both informants. However, the relative weight of common rejections and common selections among the senses of a given test word depends on the total number of senses, which varies widely.
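
The pairwise agreement rule described above can be sketched in a few lines of Python. This is only an illustrative reading of the rule as stated in the text, not the authors' program: the function names, the dictionary layout, and the sample words and ratings are all hypothetical.

```python
def agree_on_word(ratings_a, ratings_b):
    """Apply the agreement rule to one test word.

    Each argument is a list with one rating per sense definition:
    1 (sense fits), -1 (sense does not fit), 0 (undecided).
    """
    pairs = list(zip(ratings_a, ratings_b))
    # At least one sense must be selected ("1") by both informants.
    shared_selection = any(a == 1 and b == 1 for a, b in pairs)
    # No sense selected by one informant may be rejected ("-1") by the other.
    conflict = any((a == 1 and b == -1) or (a == -1 and b == 1) for a, b in pairs)
    return shared_selection and not conflict


def pairwise_agreement(informant_a, informant_b):
    """Fraction of shared test words on which two informants agree.

    Each informant is a dict mapping a test word to its list of ratings
    (a hypothetical data layout chosen for this sketch).
    """
    words = informant_a.keys() & informant_b.keys()
    agreed = sum(agree_on_word(informant_a[w], informant_b[w]) for w in words)
    return agreed / len(words)


# Hypothetical ratings, invented purely for illustration.
informant_a = {"bank": [1, -1, -1], "pen": [1, 1, -1]}
informant_b = {"bank": [1, 0, -1], "pen": [-1, 1, -1]}
score = pairwise_agreement(informant_a, informant_b)
```

In this invented sample the informants agree on "bank" (a shared selection, no conflict) but not on "pen" (the first sense is selected by one and rejected by the other), so the score is one word out of two. Note that senses rejected by both informants contribute nothing to the score, which is the limitation the text points out.
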
No discrete-valued scoring mechanism seems able to solve this problem. A pairwise scoring procedure that gives much more plausible results is the coefficient of correlation, applied to the parallel evaluations by the informants being compared. It clearly distinguishes the relatively high agreement expected from human subjects from the relatively low agreement predicted for primitive automatic disambiguation systems, and from the more or less random behavior of a control series of random "disambiguations."

Table 1. Pairwise correlations of performance of human, machine and control disambiguations of test texts
(column labels: h1 h2 h3 h4 h5 h6 h6a h7 m1 m1a m2 a1)
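
As a rough illustration of correlation-based scoring, the sketch below computes a correlation coefficient over two informants' parallel ratings. It assumes the Pearson product-moment coefficient, which the text does not name explicitly, and it assumes that ratings are concatenated across test words in a fixed order; the helper names and the sample ratings are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length rating sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined when one informant gives constant ratings.
        raise ValueError("correlation undefined for constant ratings")
    return cov / math.sqrt(var_x * var_y)


def flatten(informant, words):
    """Concatenate an informant's per-sense ratings over the given words."""
    return [rating for word in words for rating in informant[word]]


# Hypothetical ratings (1 = fits, -1 = does not fit, 0 = undecided).
informant_a = {"bank": [1, -1, -1], "pen": [1, 1, -1]}
informant_b = {"bank": [1, 0, -1], "pen": [-1, 1, -1]}
words = sorted(informant_a.keys() & informant_b.keys())
r = pearson(flatten(informant_a, words), flatten(informant_b, words))
```

Unlike the discrete agreement rule, this score gives credit for senses that both informants reject as well as for senses that both select, and it does so without a per-word threshold that depends on the number of senses.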

Similar resources

Word Sense Disambiguation and Human Intuition for Semantic Classification on Homonyms

This paper reports a psycholinguistic research for the human intuition on the sense classification. The goal of this research is to find a computational model that fits best with our experiments on human intuition. In this regard, we compare three different computational models; the Boolean model, the probabilistic model, and the probabilistic inference model. We first measured the values of ea...

Knowing a word (sense) by its company

Supervised word sense disambiguation requires training corpora that have been tagged with word senses, and these word senses typically come from a pre-existing sense inventory. Space limitations imposed by dictionary publishers have biased the field towards lists of discrete senses for an individual lexeme. This approach does not capture information about relatedness of individual senses. How i...

Unsupervised and Minimally Supervised Learning of Lexical Semantics Proceedings of the Workshop

Supervised word sense disambiguation requires training corpora that have been tagged with word senses, and these word senses typically come from a pre-existing sense inventory. Space limitations imposed by dictionary publishers have biased the field towards lists of discrete senses for an individual lexeme. This approach does not capture information about relatedness of individual senses. How i...

Word Sense Disambiguation-A Survey

Word sense disambiguation (WSD) is a linguistically based mechanism for automatically defining the correct sense of a word in the context. WSD is a long standing problem in computational linguistics. A particular word may have different meanings in different contexts. For human beings, it is easy to extract the correct meaning by analyzing the sentences. In the area of natural language processi...

Choosing Sense Distinctions for WSD: Psycholinguistic Evidence

Supervised word sense disambiguation requires training corpora that have been tagged with word senses, which begs the question of which word senses to tag with. The default choice has been WordNet, with its broad coverage and easy accessibility. However, concerns have been raised about the appropriateness of its fine-grained word senses for WSD. WSD systems have been far more successful in dist...

Publication date: 1993